About the Data

These data show air pollution at each census tract. They specifically focus on concentrations of PM2.5, meaning fine particulate matter that is less than 2.5 micrometers in diameter. PM2.5 concentrations are measured by the number of micrograms per cubic meter. High concentrations of PM2.5 indicate higher levels of air pollution.

The data show PM2.5 concentrations at every year from 1981-2016. However, we focus only on 1981 and 2016 and how the data changed from 1981 to 2016.

Variable Descriptions

glimpse(airquality)
## Rows: 50
## Columns: 9
## $ trtid10         <dbl> 51003010100, 51003010201, 51003010202, 51003010300, 51…
## $ pm2_5_1981      <dbl> 22.424961, 13.811016, 10.633014, 24.722347, 14.738005,…
## $ pm2_5_2016      <dbl> 6.137776, 3.962960, 3.051047, 7.119418, 4.167065, 2.79…
## $ percentile_1981 <dbl> 54.264661, 22.431369, 14.253857, 64.884069, 24.607330,…
## $ percentile_2016 <dbl> 38.138176, 17.884152, 10.742680, 52.455747, 19.704147,…
## $ STATEFP00       <dbl> 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51…
## $ COUNTYFP00      <chr> "003", "003", "003", "003", "003", "003", "003", "003"…
## $ PM_change       <dbl> -16.287185, -9.848057, -7.581967, -17.602929, -10.5709…
## $ pctile_change   <dbl> -16.126485, -4.547217, -3.511178, -12.428322, -4.90318…

Observations are census tract estimates of:

  • PM2.5 levels in 1981 and 2016 (pm2_5_1981 and pm2_5_2016)
  • Percentile rankings in 1981 and 2016 (percentile_1981 and percentile_2016)
    • Percentile rankings were calculated for the whole country and were not altered once the data were filtered to just the Charlottesville region.
  • Change in PM2.5 level between 1981 and 2016 (PM_change)
  • Change in percentile rank between 1981 and 2016 (pctile_change)

Summaries

Five-number summaries of all variables:

airquality %>% select(-c(trtid10, STATEFP00, COUNTYFP00)) %>% 
  select(where(~is.numeric(.x) && !is.na(.x))) %>% 
  as.data.frame() %>% 
  stargazer(., type = "text", title = "Summary Statistics", digits = 0,
            summary.stat = c("mean", "sd", "min", "median", "max"))
## 
## Summary Statistics
## ============================================
## Statistic       Mean St. Dev. Min Median Max
## --------------------------------------------
## pm2_5_1981       21     9      5    24   62 
## pm2_5_2016       6      3      1    7    17 
## percentile_1981  52     27     4    63   100
## percentile_2016  43     25     3    49   99 
## PM_change       -15     7     -45  -17   -4 
## pctile_change    -9     4     -16   -9   -0 
## --------------------------------------------

Visual Distributions

Visual representations of the data:

airquality %>% select(trtid10, percentile_1981, percentile_2016) %>% 
  pivot_longer(-trtid10, names_to = "measure", values_to = "value") %>% 
  ggplot(aes(x = value, fill = measure)) + 
  geom_histogram() + 
  facet_wrap(~measure, scales = "free") +
  xlim(0,100) +
  xlab("Percentile score") +
  scale_fill_discrete(labels = c("Percentile in 1981", "Percentile in 2016"))

meta %>% 
  filter(varname %in% c("percentile_1981", "percentile_2016")) %>%
  mutate(label = paste0(varname, ": ", about)) %>% 
  select(label) %>% 
  as.list()
## $label
## [1] "percentile_1981: Nationwide percentile rank in 1981, on a scale of 0-100"
## [2] "percentile_2016: Nationwide percentile rank in 2016, on a scale of 0-100"

Percentile in 1981 vs. percentile in 2016

airquality %>% 
  ggplot() +
  geom_point(aes(x=percentile_1981, y=percentile_2016)) +
  xlim(0, 100) +
  ylim(0, 100) +
  geom_abline(intercept = 0, slope = 1, color = "red") +
  xlab("Percentile in 1981") +
  ylab("Percentile in 2016")

meta %>% 
  filter(varname %in% c("percentile_1981", "percentile_2016")) %>%
  mutate(label = paste0(varname, ": ", about)) %>% 
  select(label) %>% 
  as.list()
## $label
## [1] "percentile_1981: Nationwide percentile rank in 1981, on a scale of 0-100"
## [2] "percentile_2016: Nationwide percentile rank in 2016, on a scale of 0-100"

This scatterplot shows the relationship between a census tract’s percentile rank in 1981 and its percentile rank in 2016. The red line shows where the data would be if their percentiles in 1981 and 2016 were the same.

Percentiles vs. percentile change

airquality %>% 
  ggplot() +
  geom_point(aes(x=percentile_1981, y=pctile_change)) +
  xlim(0, 100) +
  xlab("Percentile in 1981") +
  ylab("Percentile change, 1981-2016") +
  ggtitle("1981 Percentile vs. Percentile Change") +
  theme(plot.title = element_text(hjust = 0.5))

meta %>% 
  filter(varname %in% c("percentile_1981", "pctile_change")) %>%
  mutate(label = paste0(varname, ": ", about)) %>% 
  select(label) %>% 
  as.list()
## $label
## [1] "percentile_1981: Nationwide percentile rank in 1981, on a scale of 0-100"                      
## [2] "pctile_change: Change in percentile rank from 1981 to 2016 (percentile_2016 - percentile_1981)"

This scatterplot shows the relationship between a census tract’s percentile in 1981 and its change in percentile from 1981 to 2016.

airquality %>% 
  ggplot() +
  geom_point(aes(x=percentile_2016, y=pctile_change)) +
  xlim(0, 100) +
  xlab("Percentile in 2016") +
  ylab("Percentile change, 1981-2016") +
  ggtitle("2016 Percentile vs. Percentile Change") +
  theme(plot.title = element_text(hjust = 0.5))

meta %>% 
  filter(varname %in% c("percentile_2016", "pctile_change")) %>%
  mutate(label = paste0(varname, ": ", about)) %>% 
  select(label) %>% 
  as.list()
## $label
## [1] "percentile_2016: Nationwide percentile rank in 2016, on a scale of 0-100"                      
## [2] "pctile_change: Change in percentile rank from 1981 to 2016 (percentile_2016 - percentile_1981)"

This scatterplot shows the relationship between a census tract’s percentile in 2016 and its change in percentile from 1981 to 2016.

Spatial Distributions

Percentile in 1981

pal <- colorNumeric("Blues", reverse = TRUE, domain = cvilleshapes$percentile_1981)
leaflet(cvilleshapes) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = cvilleshapes,
              fillColor = ~pal(percentile_1981),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0("Tract Number: ", cvilleshapes$GEOID, "<br>",
                             "Percentile: ", round(cvilleshapes$percentile_1981, 2))) %>%
  addLegend("bottomright", pal = pal, values = cvilleshapes$percentile_1981,
            title = "Percentile, 1981", opacity = 0.7)
meta %>% 
  filter(varname=="percentile_1981") %>%
  mutate(label = paste0(varname, ": ", about)) %>% 
  select(label) %>% 
  as.list()
## $label
## [1] "percentile_1981: Nationwide percentile rank in 1981, on a scale of 0-100"

Percentile in 2016

pal <- colorNumeric("Blues", reverse = TRUE, domain = cvilleshapes$percentile_2016)
leaflet(cvilleshapes) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = cvilleshapes,
              fillColor = ~pal(percentile_2016),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0("Tract Number: ", cvilleshapes$GEOID, "<br>",
                             "Percentile: ", round(cvilleshapes$percentile_2016, 2))) %>%
  addLegend("bottomright", pal = pal, values = cvilleshapes$percentile_2016,
            title = "Percentile, 2016", opacity = 0.7)
meta %>% 
  filter(varname=="percentile_2016") %>%
  mutate(label = paste0(varname, ": ", about)) %>% 
  select(label) %>% 
  as.list()
## $label
## [1] "percentile_2016: Nationwide percentile rank in 2016, on a scale of 0-100"

Percentile Change, 1981-2016

pal <- colorNumeric("Blues", reverse = TRUE, domain = cvilleshapes$pctile_change)
leaflet(cvilleshapes) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = cvilleshapes,
              fillColor = ~pal(pctile_change),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0("Tract Number: ", cvilleshapes$GEOID, "<br>",
                             "Percentile Change: ", round(cvilleshapes$pctile_change, 2))) %>%
  addLegend("bottomright", pal = pal, values = cvilleshapes$pctile_change,
            title = "Percentile Change, 1981-2016", opacity = 0.7)
meta %>% 
  filter(varname=="pctile_change") %>%
  mutate(label = paste0(varname, ": ", about)) %>% 
  select(label) %>% 
  as.list()
## $label
## [1] "pctile_change: Change in percentile rank from 1981 to 2016 (percentile_2016 - percentile_1981)"

PM2.5 Concentration in 2016

pal <- colorNumeric("Blues", reverse = TRUE, domain = cvilleshapes$pm2_5_2016)
leaflet(cvilleshapes) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = cvilleshapes,
              fillColor = ~pal(pm2_5_2016),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0("Tract Number: ", cvilleshapes$GEOID, "<br>",
                             "Concentration: ", cvilleshapes$pm2_5_2016)) %>%
  addLegend("bottomright", pal = pal, values = cvilleshapes$pm2_5_2016,
            title = "PM2.5 Concentration, 2016", opacity = 0.7)
meta %>% 
  filter(varname=="pm2_5_2016") %>%
  mutate(label = paste0(varname, ": ", about)) %>% 
  select(label) %>% 
  as.list()
## $label
## [1] "pm2_5_2016: Concentration of PM2.5 in 2016"

Change in PM2.5, 1981-2016

pal <- colorNumeric("Blues", reverse = TRUE, domain = cvilleshapes$PM_change)
leaflet(cvilleshapes) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = cvilleshapes,
              fillColor = ~pal(PM_change),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0("Tract Number: ", cvilleshapes$GEOID, "<br>",
                             "PM2.5 Change: ", round(cvilleshapes$PM_change, 2))) %>%
  addLegend("bottomright", pal = pal, values = cvilleshapes$PM_change,
            title = "Change in PM2.5, 1981-2016", opacity = 0.7)
meta %>% 
  filter(varname=="PM_change") %>%
  mutate(label = paste0(varname, ": ", about)) %>% 
  select(label) %>% 
  as.list()
## $label
## [1] "PM_change: Change in the concentration of PM2.5 from 1981 to 2016 (pm2_5_2016 - pm2_5_1981)"

Important Notes

The original data uses 2000 census tracts, since that is roughly the midpoint of their 1981-2016 time frame. Since we want to be able to look at this in relation to other tract-level data, we needed this in 2010 census tracts. To estimate the data for 2010 census tracts, I used the bridging data found here.

Since the conversion to 2010 census tracts involved some estimation, the PM2.5 levels in the data may not be as accurate as the original 2000 data.